Prognostic nomogram for female patients suffering from non-metastatic HER2-positive breast cancer: A SEER-based study

This study aimed to construct and validate a novel prognostic nomogram that allows physicians to predict the overall survival (OS) of female patients with non-metastatic human epidermal growth factor receptor-2 (HER2)-positive breast cancer. Data on female patients with primary, non-metastatic HER2-positive breast cancer were obtained from the Surveillance, Epidemiology, and End Results (SEER) database according to predefined inclusion and exclusion criteria. Independent prognostic variables were identified by univariate and multivariate analyses. Based on these independent predictors, a novel prognostic nomogram was constructed to predict 3- and 5-year OS. The concordance index (C-index), receiver operating characteristic (ROC) curve, and calibration plot were then used to assess the predictive power of the nomogram. A total of 36,083 eligible patients were randomly assigned to a training cohort (n = 25,259) and a verification cohort (n = 10,824). In multivariate analysis, age at diagnosis, marital status, race, site, T stage, N stage, progesterone receptor (PR) status, estrogen receptor (ER) status, surgery, radiation, and chemotherapy independently predicted survival. The nomogram was established using the training cohort. It displayed excellent discrimination and performance, as indicated by the C-index (0.764, 95% confidence interval: 0.756–0.772) and the 3- and 5-year areas under the ROC curve (AUC; 0.760 and 0.692, respectively). The calibration plots for 3- and 5-year OS showed excellent agreement. Using the SEER database, we constructed a nomogram that predicts OS for female patients with non-metastatic HER2-positive breast cancer and can offer a precise survival prediction for each patient.


Introduction
Surpassing lung cancer, female breast cancer has become the most commonly diagnosed cancer, with approximately 2.3 million new cases (11.7%) according to the GLOBOCAN 2020 estimates of cancer incidence and mortality issued by the International Agency for Research on Cancer. [1] In females, breast cancer is the most common cancer and the leading cause of cancer death. As a member of the epidermal growth factor receptor family of transmembrane receptors, the human epidermal growth factor receptor-2 (HER2) receptor tyrosine kinase plays an important role in both growth and cancer. [2,3] Amplification and overexpression of the HER2 gene are present in 20% to 25% of human breast cancers and are associated with poor clinical outcomes. [3,4] HER2 protein overexpression is measured by an immunohistochemistry (IHC)-based test. A positive HER2 test is defined as IHC3+ or as a fluorescence in situ hybridization (FISH) measurement of a HER2 gene copy number of six or more or a HER2/CEP17 ratio of 2.0 or greater. [4] The majority of patients with HER2-positive breast cancer receive surgery and postoperative chemotherapy, while an increasing number also receive neoadjuvant chemotherapy. [5] Before HER2-directed therapies became available, women with HER2-positive breast cancer typically had a shorter time to disease relapse and an increased incidence of metastases, resulting in a worse prognosis than that of HER2-negative breast cancer. [6] The introduction of anti-HER2 therapies for patients with HER2-positive breast cancer has led to great improvements in survival in both early and advanced settings. Although survival has improved partly because of these targeted therapies, the prognosis of individual patients can be extremely variable. [7] It is not completely understood how tumor characteristics and other patient factors influence treatment benefit and prognosis in HER2-positive breast cancer. By incorporating and illustrating important prognostic factors, nomograms allow physicians to estimate the prognosis of patients accurately and in a timely manner. [8] In this study, female patients with non-metastatic HER2-positive breast cancer were analyzed using the Surveillance, Epidemiology, and End Results (SEER) Program, including their clinicopathological characteristics and prognostic factors. A nomogram was then established and validated to predict the individual 3- and 5-year overall survival (OS) rates of these patients.

The datasets generated during and/or analyzed during the current study are publicly available. The data that support the results of the present study are available in the Surveillance, Epidemiology, and End Results database, https://seer.cancer.gov.
The SEER database is publicly accessible; as a consequence, we did not obtain approval or informed consent from an institutional review committee for this study.
The authors have no funding and conflicts of interest to disclose.

Data source
The SEER database, maintained by the National Cancer Institute of the United States (US) National Institutes of Health (https://seer.cancer.gov/), is one of the most authoritative cancer databases. SEER currently collects incidence, prevalence, and mortality information from cancer registries covering about 34.6% of the US population. The data for this study were obtained from SEER.

Research population
The study population included all female patients with non-metastatic HER2-positive breast cancer and was extracted from the SEER Research Plus Data (18 Regs, Nov Sub 2000) with SEER*Stat 8.3.9. The following criteria were applied in SEER*Stat to identify patients: female sex; Site and Morphology (TNM 7/CS v0204 + Schema thru 2017) restricted to "Breast"; Derived AJCC M, 7th ed (2010-2015) restricted to "M0"; Derived HER2 Recode (2010+) restricted to "Positive"; and First malignant primary indicator restricted to "Yes".

Variables
For each case, SEER provided the following data: age at diagnosis, race, marital status, laterality, site, grade, T stage (AJCC, 7th ed.), N stage (AJCC, 7th ed.), breast cancer subtype, surgery, radiation, chemotherapy, survival months, and vital status. We excluded cases with unknown race or marital status; bilateral or unspecified laterality; or unknown AJCC T stage, N stage, histological grade, estrogen receptor (ER) status, or progesterone receptor (PR) status. In the race category, "Other" comprised Asian or Pacific Islander and American Indian/Alaska Native. In the marital status category, "Not married" included single, divorced, widowed, separated, unmarried, or domestic partner. In the site category, "Other" comprised nipple, central portion of breast, axillary tail of breast, overlapping lesion of breast, and breast, NOS. The cutoff date for follow-up was December 31, 2018. The AJCC 7th edition TNM staging system was published in 2010, and HER2 status only became available in the SEER database in the same year. Therefore, the actual study period for this sub-database is 2010 to 2018.

Nomogram establishment
The eligible patients identified from the SEER registries were randomly divided at a ratio of 7:3 into a training cohort and a verification cohort. The nomogram was established using the training cohort. Independent prognostic variables were identified by univariate and multivariate analyses. Hazard ratios (HRs) are reported with their 95% confidence intervals (CIs). Based on these independent predictors, a novel prognostic nomogram was constructed to predict 3- and 5-year OS.
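The 7:3 random split can be sketched as follows. This is a minimal illustration in Python, not the authors' procedure (the paper does not state how the split was implemented). The function name `split_cohort`, the seed, and the choice to round the training size up are all assumptions; rounding up was chosen only because it reproduces the reported cohort sizes of 25,259 and 10,824 from 36,083 patients.

```python
import math
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2021):
    """Randomly split patient IDs into a training and a verification cohort.

    Assumptions: the training size is rounded up, and the seed is arbitrary.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(ids)
    n_train = math.ceil(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Placeholder integer IDs standing in for the 36,083 eligible patients.
training, verification = split_cohort(range(36083))
print(len(training), len(verification))
```

Because the two cohorts are slices of one shuffled list, every patient lands in exactly one cohort.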

Verification of the nomogram
Internal validation of the nomogram was conducted in the training cohort, while external validation was performed in the verification cohort to evaluate prediction efficiency. The Harrell concordance index (C-index) and the area under the receiver operating characteristic (ROC) curve (AUC) were used to evaluate the discrimination of the nomogram. A higher C-index or AUC value indicates more precise prognostic prediction. [8] A C-index between 0.71 and 0.90 indicates excellent discriminative capacity, and a C-index above 0.90 indicates even higher precision. Similarly, the higher the AUC value, the better the predictive ability of the nomogram. The calibration of the nomogram was evaluated with calibration plots. For a perfectly calibrated model, the predictions fall on the 45° diagonal line of the calibration plot.
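To make the C-index concrete, the sketch below computes it by direct pair counting. It is a simplified illustration, not the rms implementation used in the study: the function name `harrell_c_index` and the toy data are hypothetical, pairs with tied survival times are simply ignored, and ties in predicted risk count as half-concordant.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i has an observed event
    (events[i] == 1) strictly before patient j's follow-up time. The pair
    is concordant when the patient who died earlier has the higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): survival months, event flag (1 = death), risk score.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.6, 0.7, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risks))
```

In the toy data, four of the five comparable pairs are ordered correctly. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.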

Statistical analysis
The clinical and pathological features of the training and verification cohorts were compared with the chi-square test, as appropriate. The Kaplan-Meier method was used to calculate cumulative survival curves for each patient variable. Univariate and multivariate Cox regression analyses were used to identify significant independent prognostic variables. All P values were two-sided, and P < .05 was considered statistically significant. IBM SPSS Statistics 26.0 (SPSS, Inc, Chicago, IL) was used to perform the univariate and multivariate Cox analyses. R software version 4.0.3 (http://www.R-project.org) was used to construct the nomogram, receiver operating characteristic (ROC) curves, and calibration plots, using the survival and rms packages.
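The Kaplan-Meier method mentioned above can be illustrated with a short product-limit calculation. This is a sketch in Python for exposition only (the study used SPSS and R); the function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    At every time with at least one death, the running survival estimate is
    multiplied by (1 - deaths / number still at risk). Censored patients
    (event flag 0) leave the risk set without changing the estimate.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Toy data (hypothetical): survival months and event flag (1 = death).
for month, prob in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(month, round(prob, 3))
```

Computing one such curve per category of a variable (for example, married versus not married) gives the stratified survival curves described in the text.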

Clinicopathological features of the training and verification sets
We identified a total of 41,497 female patients with non-metastatic HER2-positive breast cancer in the SEER database. Of these, 5414 patients were excluded because of incomplete data. The remaining 36,083 eligible patients were included in our analysis, with 25,259 patients in the training set and 10,824 patients in the verification set. Table 1 presents the clinicopathological features of the training and verification sets.

Table 1. The demographics and clinical features of female patients with non-metastatic HER2-positive breast cancer in the different cohorts.

Independent prognostic factors in the training set and establishment of the nomogram
The nomogram was established using the training set. Table 2 displays the univariate and multivariate analyses of potential predictors of OS. In univariate analysis, age at diagnosis, marital status, race, site, grade, T stage, N stage, breast cancer subtype, PR status, ER status, surgery, radiation, and chemotherapy were significantly associated with OS. These significant factors were therefore entered into the multivariate analysis. In multivariate analysis, age at diagnosis, marital status, race, site, T stage, N stage, PR status, ER status, surgery, radiation, and chemotherapy independently predicted survival (Table 2). The nomogram for 3- and 5-year OS (Fig. 1) was built from these independent factors.
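A nomogram of this kind is, in essence, a graphical lookup for a fitted Cox model. The sketch below shows one common way the points scale can be derived, under stated assumptions: the coefficients in `COEFS` are invented placeholders (not values from Table 2), only three predictors are included, and the helper names `points_table`, `total_points`, and `survival_probability` are hypothetical. The predictor with the widest coefficient range is mapped to 0-100 points and all others are scaled proportionally; the baseline survival passed to `survival_probability` is assumed to refer to a patient in every reference category.

```python
import math

# Hypothetical log hazard ratios per category. These are placeholders for
# illustration and are NOT the coefficients reported in Table 2.
COEFS = {
    "age": {"<40": 0.0, ">=40": 0.5},
    "t_stage": {"T1": 0.0, "T2": 0.4, "T3": 0.8, "T4": 1.2},
    "chemotherapy": {"yes": 0.0, "no": 0.6},
}


def points_table(coefs):
    """Scale coefficients so the widest predictor spans 0 to 100 points."""
    spans = {v: max(c.values()) - min(c.values()) for v, c in coefs.items()}
    widest = max(spans.values())
    return {
        var: {lvl: 100.0 * (b - min(c.values())) / widest for lvl, b in c.items()}
        for var, c in coefs.items()
    }


def total_points(patient, table):
    """Sum the points assigned to each of the patient's categories."""
    return sum(table[var][lvl] for var, lvl in patient.items())


def survival_probability(patient, coefs, baseline_survival):
    """Cox model prediction: S(t | x) = S0(t) ** exp(linear predictor)."""
    lp = sum(coefs[var][lvl] for var, lvl in patient.items())
    return baseline_survival ** math.exp(lp)


table = points_table(COEFS)
patient = {"age": ">=40", "t_stage": "T2", "chemotherapy": "no"}
print(round(total_points(patient, table), 1))
# 0.90 is an assumed 5-year baseline survival for the reference patient.
print(round(survival_probability(patient, COEFS, 0.90), 3))
```

Because total points are a linear rescaling of the Cox linear predictor, a higher total always maps to a lower predicted survival probability.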

Nomogram validation
The nomogram was validated internally in the training cohort. The Harrell C-index was 0.764 (95% confidence interval: 0.756-0.772) in the training set, indicating good discriminative ability. Similarly, the Harrell C-index was 0.757 (95% confidence interval: 0.743-0.771) in the external validation set. Moreover, the 3- and 5-year AUC values were 0.760 and 0.692 in the training set and 0.760 and 0.713 in the validation set (Fig. 2). These results indicate that the nomogram can accurately predict OS. The calibration of the nomogram was evaluated with internal and external calibration plots. Higher total points, obtained by summing the points assigned to each factor in the nomogram, corresponded to a worse prognosis. As shown in Figure 3, the calibration plots for 3- and 5-year OS showed excellent agreement between predicted and observed survival in both the training and validation sets.
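The 3- and 5-year AUC values summarise how well the predicted risk separates patients who died by a given time from those still alive. The sketch below is a deliberately naive version of that idea, offered as an assumption-laden illustration: the paper does not state which time-dependent ROC method was used in R, and this version simply drops patients censored before the horizon instead of weighting them. The function name `auc_at_horizon` and the toy data are hypothetical.

```python
def auc_at_horizon(times, events, risk_scores, horizon):
    """Naive AUC for death by `horizon` (same time unit as `times`).

    Cases are patients with an observed death on or before the horizon;
    controls are patients followed beyond it. Patients censored before the
    horizon have unknown status and are skipped (a simplification).
    """
    cases, controls = [], []
    for t, e, r in zip(times, events, risk_scores):
        if t <= horizon and e == 1:
            cases.append(r)
        elif t > horizon:
            controls.append(r)
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for rc in cases:
        for rn in controls:
            if rc > rn:
                wins += 1.0
            elif rc == rn:
                wins += 0.5  # tied scores get half credit
    return wins / (len(cases) * len(controls))


# Toy data (hypothetical): survival months, event flag, predicted risk.
toy_times = [10, 20, 30, 40, 50, 25]
toy_events = [1, 1, 0, 0, 1, 0]
toy_risks = [0.9, 0.25, 0.5, 0.2, 0.3, 0.8]
print(auc_at_horizon(toy_times, toy_events, toy_risks, horizon=36))  # 3-year horizon
```

Methods that handle censoring properly would be preferable on real registry data; the sketch only conveys what the reported AUC values measure.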

Discussion
Using univariate and multivariate Cox proportional hazards regression, we identified prognostic factors associated with the OS of female patients with non-metastatic HER2-positive breast cancer. These factors were age at diagnosis, marital status, race, site, T stage, N stage, PR status, ER status, surgery, radiation, and chemotherapy. We excluded metastatic disease because its treatment is generally delivered with palliative rather than curative intent. Using the SEER database, we set up a nomogram that quantitatively predicts 3- and 5-year OS rates from patient-associated and tumor-associated factors. Our findings can inform preventive and therapeutic strategies aimed at improving survival for these women.
Age at diagnosis is related to breast cancer survival. Approximately 5% of breast cancers are reported to be diagnosed in women younger than 40 years of age. [9] In fact, during the review period almost 90% of patients younger than 40 years died from their breast cancer, compared with just 49% of patients aged 40 years and older. Several studies have shown that breast cancers in younger women often exhibit more aggressive biological characteristics, such as estrogen receptor-negative (ER-) and HER2-positive tumors and high grade, and are more prone to lymph node metastasis, resulting in higher recurrence and mortality rates. [10][11][12][13][14] Thus, in this study, we used an age cutoff of 40 years to examine the relationship between age and prognosis in HER2-positive breast cancer patients. In our study, older women experienced poorer outcomes, which is consistent with previous publications. [15][16][17] The reasons for the worse outcomes among older patients are multifaceted. One important factor is that older patients tend to receive nonstandard treatment because of lower tolerance of surgery, chemotherapy, and radiotherapy. [18,19]
In our analysis, marital status had a higher hazard ratio than age, race, receipt of radiation, grade, T stage (1-2), and N1 disease. Married patients showed a better prognosis than the others, a group that included single, divorced, widowed, separated, unmarried, or domestic partner patients. Marital status and ethnicity, as important forms of social relations, have been suggested as predictive factors for breast cancer survival. Married patients have been found to have more emotional and financial support. They may be diagnosed at an earlier stage and receive appropriate treatment, which prolongs their overall survival. [20][21][22] Our findings suggest that more attention should be paid to psychological and social support during the treatment of breast cancer patients.
Currently, many clinicopathological characteristics are taken into consideration for the prognosis of breast cancer patients, such as site, grade, T stage, N stage, breast cancer subtype, ER status, and PR status. Previous studies have shown that anatomic site is a significant independent prognostic factor. Our study agrees with others that breast tumors in the lower outer quadrant (LOQ) have the best prognosis, [23] although tumors in the upper outer quadrant (UOQ) are generally considered to have the best prognosis. [24,25] Anatomical differences can significantly affect the development of tumor metastasis, which is very important for the prognosis of breast cancer. Metastases to the internal mammary nodes are difficult to detect on imaging, leading to inadequate diagnosis and treatment. [26,27] Compared with more medial tumors, tumors in the outer regions show a better prognosis because lymphatic involvement is more easily detected and surgical management is more complete. [28] In addition, we found that laterality was not a prognostic factor for OS in female patients with non-metastatic HER2-positive breast cancer. However, Karatas et al demonstrated that left laterality was an independent prognostic factor for metastasis in N3 stage breast cancer. [29] Left-sided radiotherapy has also been reported to be associated with increased cardiac mortality. [30] The mechanisms underlying the effects of laterality and site on breast cancer outcomes need to be further explored.
ER positivity (ER+) and PR positivity (PR+) have been regarded as protective factors for the prognosis of breast cancer in most previous research. [31,32] Hormone therapy is considered effective for hormone-receptor-positive (HR+) patients, which provides broader therapeutic options for HER2-positive patients.
Controlled trials have reported that, despite overexpression of the HER2 oncogene, hormone receptor status is still an important determinant of disease outcome, with more recurrences and deaths among women with hormone-receptor-negative disease even after a median follow-up of 11 years. [33] Interestingly, we found that ER+ and PR+ status each exhibited a higher hazard ratio, whereas the HR+ subtypes showed no statistically significant hazard ratio. To date, data on HR subtypes in HER2-positive breast cancer patients are limited. Early studies reported that the HR+/HER2+ subtype was associated with a better prognosis than the hormone-receptor-negative (HR-)/HER2+ subtype. [34,35] In contrast, Bae et al [36] found no difference in OS among four subtypes of HER2+ breast cancer (estrogen receptor-positive [ER+]/progesterone receptor-positive [PR+], ER-/PR+, ER+/progesterone receptor-negative [PR-], and ER-/PR-). Our study showed similar results: the single HR+ subtypes (ER-/PR+, ER+/PR-) were not significantly associated with OS in HER2-positive breast cancer. These results indicate that HER2 expression and anti-HER2 therapy may be more significant prognostic factors than single HR+ expression in HER2+ breast cancer.
AJCC T stage and AJCC N stage are recognized prognostic factors for breast cancer, [37,38] and our study showed the same results. Using the 7th edition of the AJCC classification to predict breast cancer prognosis is a traditional and classical approach. The TNM staging system has proven to be an excellent tool for predicting breast cancer prognosis and guiding therapeutic selection worldwide. In terms of therapy, the approaches for HER2-positive breast cancer include surgery, chemotherapy, radiotherapy, endocrine therapy, and anti-HER2 therapy. The wide application of these treatments has contributed to reducing locoregional and distant recurrence, which clearly benefits patients. In a meta-analysis by the EBCTCG, post-mastectomy radiotherapy for patients with positive axillary lymph nodes decreased the 10-year first recurrence rate by 10.6%, resulting in an 8.1% decrease in breast cancer mortality after 20 years. [39] Anti-HER2 therapy (mainly trastuzumab and pertuzumab) combined with chemotherapy has led to dramatic improvements in the survival of patients with HER2-positive breast cancer. [40] Luo et al [41] studied 1304 consecutive patients with non-metastatic HER2-positive breast cancer and identified several independent prognostic factors to set up a nomogram. The authors used their own clinical database, which did provide treatment details, as the training set and then used SEER data for validation. In contrast, we selected a large patient population from the SEER database to construct our nomogram, which also displayed excellent discriminative power for predicting prognosis. The study of Luo et al and ours are consistent and support each other, making the results more convincing.
Although the nomogram displayed excellent discrimination and performance, our research has certain limitations. Firstly, other factors with some prognostic value were not available from the SEER database, including surgical margin status, Ki-67 level, and type of chemotherapy. Secondly, data on the administration of anti-HER2 therapy and hormonal therapy were not available to us. Thirdly, grouping some categories of variables as "Other" might introduce bias. Finally, because the study relies on retrospective data, the nomogram requires external validation in a prospective cohort before it is applied in clinical practice.

Conclusion
In conclusion, demographic and clinicopathological features from a large population-based cohort were incorporated into an efficient nomogram for predicting the prognosis of female patients with non-metastatic HER2-positive breast cancer. With this nomogram, clinicians can more accurately predict individual overall survival at 3 and 5 years, which can inform subsequent management.